Skip to content

[Enhancement] (FE) Convert intra FE-to-FE calls to HTTPS when enabled#60921

Open
nsivarajan wants to merge 7 commits intoapache:masterfrom
nsivarajan:https-internal-fe-communication2
Open

[Enhancement] (FE) Convert intra FE-to-FE calls to HTTPS when enabled#60921
nsivarajan wants to merge 7 commits intoapache:masterfrom
nsivarajan:https-internal-fe-communication2

Conversation

@nsivarajan
Copy link
Contributor

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Currently, Doris provides an enable_https switch to enforce HTTPS connections. In hardened deployments, HTTP is completely disabled by setting http_port = 0. However, intra FE-to-FE communication still relies on HTTP, causing failures in edit log and checkpoint synchronisation when HTTPS is enabled.

This PR enhances HTTPS support by automatically converting intra FE-to-FE communication to HTTPS when enable_https=true, without introducing any new configuration, leveraging mysql_ssl_default_ca_certificate created for https connection. When enable_https=false, behavior remains unchanged. This ensures secure and seamless FE-to-FE communication without breaking existing workflows.

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Copy link
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@nsivarajan nsivarajan marked this pull request as ready for review February 28, 2026 19:18
@nsivarajan
Copy link
Contributor Author

run buildall

@nsivarajan
Copy link
Contributor Author

run buildall

@nsivarajan
Copy link
Contributor Author

run buildall

@doris-robot
Copy link

TPC-H: Total hot run time: 28728 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 29f45dbe504c8d46be0de681c14f24d3a97c8291, data reload: false

------ Round 1 ----------------------------------
============================================
q1	17684	4460	4289	4289
q2	q3	10648	764	517	517
q4	4680	362	249	249
q5	7562	1191	1015	1015
q6	178	176	149	149
q7	799	835	682	682
q8	9311	1426	1316	1316
q9	4802	4758	4692	4692
q10	6803	1857	1651	1651
q11	476	269	232	232
q12	718	566	470	470
q13	17774	4199	3412	3412
q14	245	236	212	212
q15	928	805	799	799
q16	770	716	681	681
q17	731	857	419	419
q18	5889	5411	5196	5196
q19	1178	964	601	601
q20	507	492	392	392
q21	4756	1996	1453	1453
q22	390	341	301	301
Total cold run time: 96829 ms
Total hot run time: 28728 ms

----- Round 2, with runtime_filter_mode=off -----
============================================
q1	4762	4651	4592	4592
q2	q3	1828	2177	1772	1772
q4	879	1187	794	794
q5	4012	4336	4277	4277
q6	180	186	140	140
q7	1770	1634	1517	1517
q8	2488	2690	2508	2508
q9	7478	7402	7416	7402
q10	2622	2818	2368	2368
q11	533	459	437	437
q12	515	611	473	473
q13	4098	4792	3592	3592
q14	283	302	273	273
q15	868	841	802	802
q16	712	755	714	714
q17	1156	1670	1290	1290
q18	7183	6960	6655	6655
q19	931	1009	930	930
q20	2072	2193	1988	1988
q21	3960	3439	3280	3280
q22	462	432	373	373
Total cold run time: 48792 ms
Total hot run time: 46177 ms

@doris-robot
Copy link

TPC-DS: Total hot run time: 183423 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 29f45dbe504c8d46be0de681c14f24d3a97c8291, data reload: false

query5	4329	646	528	528
query6	324	213	197	197
query7	4207	462	270	270
query8	334	246	234	234
query9	8698	2764	2751	2751
query10	530	385	340	340
query11	16970	17297	17154	17154
query12	195	147	136	136
query13	1419	479	404	404
query14	6572	3408	3105	3105
query14_1	2910	2895	2898	2895
query15	207	203	176	176
query16	1001	491	473	473
query17	1425	731	640	640
query18	2524	451	356	356
query19	204	203	189	189
query20	150	133	132	132
query21	264	161	119	119
query22	5631	4998	4903	4903
query23	17149	16805	16500	16500
query23_1	16681	16661	16546	16546
query24	7280	1626	1240	1240
query24_1	1259	1257	1216	1216
query25	566	489	426	426
query26	1244	258	162	162
query27	2770	469	289	289
query28	4534	1897	1903	1897
query29	809	578	477	477
query30	318	247	207	207
query31	883	729	656	656
query32	81	73	71	71
query33	525	356	295	295
query34	945	928	564	564
query35	664	695	623	623
query36	1111	1127	929	929
query37	143	95	89	89
query38	2963	2886	2840	2840
query39	884	871	856	856
query39_1	838	836	836	836
query40	240	165	138	138
query41	68	64	63	63
query42	107	102	99	99
query43	376	393	349	349
query44	
query45	206	199	188	188
query46	884	991	612	612
query47	2105	2147	2036	2036
query48	322	345	245	245
query49	642	473	393	393
query50	697	294	220	220
query51	4086	4103	4062	4062
query52	107	111	98	98
query53	291	332	288	288
query54	312	280	301	280
query55	85	88	82	82
query56	349	323	338	323
query57	1375	1344	1250	1250
query58	297	323	271	271
query59	2554	2686	2487	2487
query60	320	336	315	315
query61	145	146	144	144
query62	631	582	547	547
query63	309	282	273	273
query64	4881	1251	965	965
query65	
query66	1453	459	351	351
query67	16657	16396	16348	16348
query68	
query69	394	306	282	282
query70	969	950	979	950
query71	343	306	291	291
query72	2797	2649	2430	2430
query73	543	548	318	318
query74	9998	9947	9763	9763
query75	2842	2746	2429	2429
query76	2290	1024	676	676
query77	366	368	304	304
query78	11093	11278	10626	10626
query79	2353	796	602	602
query80	1755	628	543	543
query81	548	274	242	242
query82	1015	146	114	114
query83	322	283	244	244
query84	268	111	101	101
query85	883	474	427	427
query86	411	299	303	299
query87	3097	3115	2953	2953
query88	3552	2683	2677	2677
query89	419	366	340	340
query90	1998	172	169	169
query91	164	159	137	137
query92	77	75	67	67
query93	991	846	507	507
query94	624	307	299	299
query95	585	403	305	305
query96	641	511	229	229
query97	2479	2481	2422	2422
query98	226	221	217	217
query99	1006	1015	923	923
Total cold run time: 255369 ms
Total hot run time: 183423 ms

@nsivarajan
Copy link
Contributor Author

run buildall

@doris-robot
Copy link

TPC-H: Total hot run time: 29084 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 29f45dbe504c8d46be0de681c14f24d3a97c8291, data reload: false

------ Round 1 ----------------------------------
============================================
q1	17606	4404	4317	4317
q2	q3	10651	792	533	533
q4	4675	369	255	255
q5	7553	1220	1009	1009
q6	176	178	149	149
q7	784	839	678	678
q8	9743	1467	1330	1330
q9	5243	4798	4744	4744
q10	6825	1896	1648	1648
q11	452	255	260	255
q12	763	576	482	482
q13	17767	4242	3421	3421
q14	243	232	213	213
q15	957	794	793	793
q16	774	739	680	680
q17	730	889	412	412
q18	5983	5317	5312	5312
q19	1466	982	618	618
q20	526	511	413	413
q21	4681	1982	1556	1556
q22	408	337	266	266
Total cold run time: 98006 ms
Total hot run time: 29084 ms

----- Round 2, with runtime_filter_mode=off -----
============================================
q1	4725	4574	4697	4574
q2	q3	1819	2229	1779	1779
q4	861	1189	785	785
q5	4055	4394	4364	4364
q6	183	172	139	139
q7	1757	1629	1523	1523
q8	2490	2708	2625	2625
q9	7429	7430	7381	7381
q10	2630	2833	2400	2400
q11	526	433	408	408
q12	507	613	465	465
q13	4042	4584	3667	3667
q14	339	305	277	277
q15	835	799	810	799
q16	719	778	737	737
q17	1175	1564	1392	1392
q18	7295	6767	6768	6767
q19	940	1081	1007	1007
q20	2088	2196	2008	2008
q21	4092	3511	3473	3473
q22	531	448	419	419
Total cold run time: 49038 ms
Total hot run time: 46989 ms

@doris-robot
Copy link

TPC-DS: Total hot run time: 184255 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 29f45dbe504c8d46be0de681c14f24d3a97c8291, data reload: false

query5	4327	636	512	512
query6	321	222	211	211
query7	4233	489	282	282
query8	359	255	235	235
query9	9079	2842	2836	2836
query10	512	428	363	363
query11	17107	17503	17564	17503
query12	201	132	129	129
query13	1269	491	360	360
query14	7404	3267	3012	3012
query14_1	2935	2969	2900	2900
query15	216	192	190	190
query16	1026	407	478	407
query17	1179	782	661	661
query18	2873	495	375	375
query19	215	229	191	191
query20	144	132	136	132
query21	217	136	121	121
query22	5997	5318	4970	4970
query23	17227	16838	16732	16732
query23_1	16798	16803	16807	16803
query24	7063	1635	1224	1224
query24_1	1249	1252	1233	1233
query25	547	481	428	428
query26	821	281	160	160
query27	2719	484	280	280
query28	4377	1860	1856	1856
query29	783	562	471	471
query30	311	235	208	208
query31	855	754	618	618
query32	81	71	68	68
query33	516	325	279	279
query34	883	915	572	572
query35	623	672	586	586
query36	1090	1148	972	972
query37	133	93	82	82
query38	3000	2907	2877	2877
query39	899	871	836	836
query39_1	825	836	832	832
query40	223	149	135	135
query41	66	59	58	58
query42	104	104	101	101
query43	369	391	353	353
query44	
query45	198	192	182	182
query46	924	986	603	603
query47	2161	2183	2067	2067
query48	317	312	236	236
query49	615	481	372	372
query50	679	276	215	215
query51	4076	4133	4008	4008
query52	108	108	97	97
query53	290	338	300	300
query54	282	267	259	259
query55	88	83	80	80
query56	343	287	305	287
query57	1359	1371	1283	1283
query58	288	277	284	277
query59	2608	2682	2503	2503
query60	335	330	314	314
query61	147	145	144	144
query62	641	591	550	550
query63	307	277	284	277
query64	4163	1255	1002	1002
query65	
query66	1352	450	358	358
query67	16237	16225	16247	16225
query68	
query69	393	312	281	281
query70	1006	962	963	962
query71	335	318	285	285
query72	2771	2678	2430	2430
query73	527	551	317	317
query74	10015	9921	9765	9765
query75	2854	2733	2455	2455
query76	2322	1029	696	696
query77	355	377	308	308
query78	11162	11329	10664	10664
query79	2678	791	605	605
query80	1780	621	576	576
query81	567	270	244	244
query82	1015	148	119	119
query83	348	260	243	243
query84	251	114	101	101
query85	870	485	439	439
query86	415	300	295	295
query87	3104	3101	2988	2988
query88	3510	2663	2656	2656
query89	427	368	344	344
query90	1951	167	168	167
query91	162	153	141	141
query92	74	77	69	69
query93	1090	823	507	507
query94	628	321	296	296
query95	594	395	315	315
query96	643	505	229	229
query97	2462	2480	2455	2455
query98	227	233	218	218
query99	1048	947	916	916
Total cold run time: 255584 ms
Total hot run time: 184255 ms

@hello-stephen
Copy link
Contributor

FE UT Coverage Report

Increment line coverage 8.33% (6/72) 🎉
Increment coverage report
Complete coverage report

@Hastyshell
Copy link
Collaborator

/review

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review Summary

This PR converts intra FE-to-FE HTTP communication to HTTPS when enable_https=true. The core approach is sound — centralizing URL construction in HttpURLUtil.buildInternalFeUrl() and adding SSL-aware HTTP clients in InternalHttpsUtils. However, I found several issues that should be addressed:

Critical Issues

  1. JVM-wide global state mutation (installTrustManagerForUrlConnection)setDefaultSSLSocketFactory() and setDefaultHostnameVerifier() modify JVM-wide defaults that affect ALL HttpsURLConnection instances in the process, not just FE-to-FE connections. This is called on every getConnectionWithNodeIdent() invocation, which is wasteful and thread-unsafe. Additionally, the hostname verifier (hostname, session) -> true completely disables hostname verification globally, which could affect other HTTPS connections in the JVM (e.g., external catalog connections).

  2. Resource leak in HttpUtils.executeRequest()CloseableHttpClient is created but never closed. The ResponseHandler variant of execute() closes the HTTP response, but NOT the client itself. This leaks connection pool resources. Other callers in the codebase (e.g., MysqlLoadManager, HFUtils) properly use try-with-resources.

  3. Incomplete conversion — Several FE-to-FE communication paths still hardcode http:// and Config.http_port, which will break when enable_https=true and http_port=0. Key missed locations:

    • HttpUtils.concatUrl() (line 75) — hardcodes http://
    • HttpUtils.getFeList() (line 65) — hardcodes Config.http_port
    • SessionController.getOtherSessionInfo() — hardcodes http://
    • NodeAction.concatFeSetConfigUrl() — hardcodes http://
    • NodeAction.getFeList() and handleConfigurationInfo() — hardcodes http:// and Config.http_port

Minor Issues

  1. Removed useful comments — Two comments about using Env.getServingEnv() instead of getCurrentEnv() were deleted. These documented a non-obvious requirement and should be preserved.

  2. Removed LOG.info in MetaHelper.doGet — The debug logging for URL, timeout, and headers was removed. This is useful for troubleshooting connectivity issues.

Checkpoints

  • Correctness: Issues found (global state mutation, incomplete conversion, resource leak)
  • Security: The approach of certificate validation + disabled hostname verification for internal communication is reasonable and documented, but the JVM-wide global mutation is a concern
  • Performance: installTrustManagerForUrlConnection() re-reads the truststore from disk and re-initializes the SSLContext on every call — should be cached/initialized once
  • Tests: Unit tests for buildInternalFeUrl() are good; no tests for the SSL client creation or the HTTPS integration path

@nsivarajan
Copy link
Contributor Author

run buildall

@doris-robot
Copy link

TPC-H: Total hot run time: 29027 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit bb0743447ea301486989c5534a1dc6b4a27e2071, data reload: false

------ Round 1 ----------------------------------
============================================
q1	17696	4435	4267	4267
q2	q3	10640	826	516	516
q4	4690	378	256	256
q5	7587	1229	1043	1043
q6	184	188	157	157
q7	806	890	681	681
q8	9331	1511	1357	1357
q9	4959	4754	4749	4749
q10	6830	1889	1652	1652
q11	490	266	247	247
q12	767	570	470	470
q13	17772	4221	3443	3443
q14	240	230	221	221
q15	964	802	801	801
q16	765	722	666	666
q17	720	842	441	441
q18	6049	5397	5228	5228
q19	1178	980	619	619
q20	534	506	396	396
q21	4761	2074	1535	1535
q22	376	335	282	282
Total cold run time: 97339 ms
Total hot run time: 29027 ms

----- Round 2, with runtime_filter_mode=off -----
============================================
q1	4603	4587	4621	4587
q2	q3	1890	2212	1788	1788
q4	841	1192	760	760
q5	4063	4368	4462	4368
q6	194	185	142	142
q7	1748	1653	1542	1542
q8	2496	2704	2590	2590
q9	7528	7498	7385	7385
q10	2628	2852	2409	2409
q11	508	442	415	415
q12	501	604	543	543
q13	3977	4384	3557	3557
q14	274	296	283	283
q15	864	917	962	917
q16	728	778	719	719
q17	1192	1506	1370	1370
q18	7161	7016	6547	6547
q19	863	872	865	865
q20	2124	2161	2025	2025
q21	3922	3472	3468	3468
q22	449	438	381	381
Total cold run time: 48554 ms
Total hot run time: 46661 ms

@doris-robot
Copy link

TPC-DS: Total hot run time: 184297 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit bb0743447ea301486989c5534a1dc6b4a27e2071, data reload: false

query5	4331	641	530	530
query6	326	219	198	198
query7	4219	465	263	263
query8	333	242	245	242
query9	8692	2713	2760	2713
query10	520	367	330	330
query11	16991	17294	17092	17092
query12	183	130	121	121
query13	1496	491	357	357
query14	6814	3350	3176	3176
query14_1	2943	2997	2899	2899
query15	203	194	177	177
query16	1070	497	481	481
query17	1115	718	668	668
query18	2915	464	380	380
query19	238	211	185	185
query20	143	138	136	136
query21	214	154	124	124
query22	5142	4980	4786	4786
query23	17224	16795	16636	16636
query23_1	16766	16646	16682	16646
query24	7281	1619	1226	1226
query24_1	1215	1246	1236	1236
query25	533	449	416	416
query26	1233	264	149	149
query27	2778	466	278	278
query28	4560	1889	1877	1877
query29	796	551	486	486
query30	308	245	210	210
query31	870	723	644	644
query32	85	74	72	72
query33	525	341	288	288
query34	940	917	568	568
query35	633	734	605	605
query36	1096	1198	999	999
query37	143	98	85	85
query38	2984	3012	2852	2852
query39	900	871	842	842
query39_1	841	830	814	814
query40	238	160	144	144
query41	68	66	66	66
query42	108	106	105	105
query43	388	384	363	363
query44	
query45	207	189	190	189
query46	881	1000	636	636
query47	2111	2135	2056	2056
query48	310	323	240	240
query49	630	474	410	410
query50	681	282	225	225
query51	4058	4039	4097	4039
query52	111	112	98	98
query53	293	359	292	292
query54	313	283	281	281
query55	98	87	83	83
query56	327	318	322	318
query57	1361	1332	1278	1278
query58	303	281	287	281
query59	2592	2714	2576	2576
query60	341	353	334	334
query61	180	173	171	171
query62	620	619	549	549
query63	321	280	292	280
query64	5058	1375	1078	1078
query65	
query66	1462	516	384	384
query67	16304	16486	16323	16323
query68	
query69	405	332	302	302
query70	1029	991	973	973
query71	341	316	307	307
query72	2800	2720	2645	2645
query73	546	552	320	320
query74	9946	9897	9778	9778
query75	2827	2778	2482	2482
query76	2315	1063	662	662
query77	360	383	313	313
query78	11141	11290	10641	10641
query79	3075	812	608	608
query80	1760	654	533	533
query81	585	282	247	247
query82	995	151	116	116
query83	332	261	245	245
query84	248	113	95	95
query85	919	476	426	426
query86	447	298	306	298
query87	3079	3121	2976	2976
query88	3560	2658	2637	2637
query89	434	373	351	351
query90	2002	175	166	166
query91	168	162	136	136
query92	80	76	73	73
query93	1386	836	510	510
query94	645	324	296	296
query95	589	346	380	346
query96	647	512	227	227
query97	2478	2523	2395	2395
query98	242	220	211	211
query99	1018	1017	886	886
Total cold run time: 256755 ms
Total hot run time: 184297 ms

@hello-stephen
Copy link
Contributor

FE Regression Coverage Report

Increment line coverage 15.73% (14/89) 🎉
Increment coverage report
Complete coverage report

@nsivarajan
Copy link
Contributor Author

@Hastyshell @dataroaring @morningman @CalvinKirs - Could you please help with review and get this merged !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants